Nebius
AE Quick Reference · Confidential
Portfolio by Rus Teston
Account Executive · Technical Cheat Sheet

Selling Nebius AI Cloud
Everything you need on one page.

GPU tiers, workload signals, discovery questions, objection responses, competitive one-liners, and proof points — so you walk into every call prepared.

The One-Sentence Pitch
Nebius is the purpose-built AI cloud — bare-metal NVIDIA GPU performance, available in hours, at the lowest total cost of ownership in the market, with dedicated Solution Architect support from day one.
NVIDIA Reference Platform Partner · Nasdaq-listed · $700M raised · SemiAnalysis Gold Medal (TCO) · HIPAA · SOC 2 · GDPR · ISO 27001
Product Portfolio — What You're Selling
AI Cloud Core
GPU clusters + managed infrastructure for training, fine-tuning, and large-scale inference. Kubernetes, Slurm, storage, observability — all included.
Sell when customer says...
"training a model" · "need a GPU cluster" · "running workloads at scale" · "replacing AWS/GCP"
Token Factory API
Production-ready model inference API — call top open-source models (including DeepSeek R1, served with vLLM) with per-token pricing. No infrastructure to manage.
Sell when customer says...
"just need an API" · "building an app on top of LLMs" · "don't want to manage GPUs" · "per-call pricing preferred"
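To make the "just need an API" pitch concrete, here is a minimal sketch of what integration looks like for the customer. It assumes Token Factory exposes an OpenAI-style chat-completions endpoint; the base URL, model id, and env-var name below are illustrative placeholders, not confirmed product values.

```python
# Hypothetical Token Factory call — the endpoint URL, model id, and env var
# are placeholders for illustration, not confirmed product values.
import json

BASE_URL = "https://tokenfactory.example.nebius/v1"  # placeholder URL
MODEL = "deepseek-r1"                                # illustrative model id

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a standard OpenAI-style chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Draft a one-line summary of our GPU usage.")
print(json.dumps(payload, indent=2))

# The actual call would be a single POST with per-token billing —
# no clusters, drivers, or autoscaling for the customer to manage:
# requests.post(f"{BASE_URL}/chat/completions",
#               headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
#               json=payload)
```

The point for the call: the customer's entire integration surface is one request body — no GPUs, no orchestration, no DevOps.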
GPU Tier Quick Reference — Match Hardware to Workload
GB300 NVL72
Flagship
Highest throughput & TCO for the most demanding AI workloads — rack-scale, liquid-cooled. Contact sales for access.
GB200 NVL72
Flagship
Rack-scale Blackwell. Heavy foundation model training + ultra-low latency reasoning inference. Contact sales.
HGX B300
Training
Next-gen accelerated computing for complex reasoning models and large-scale pre-training workloads.
HGX B200
Balanced
Blackwell air-cooled. Ideal for reasoning LLMs, multi-modal models, and agentic AI. Self-service available.
HGX H200
Extended Mem
Extended GPU memory — predictable performance for LLM + multi-modal training and inference. Self-service.
HGX H100
Cost-Effective
Cost-effective and robust for building and serving foundation models at scale. Best entry point for new customers.
★ All connected via NVIDIA InfiniBand / Quantum-X800 for distributed training · Hopper = H-series · Blackwell = B-series
Workload Decoder — What They Say → What They Need
🧠 Training
They say "Building a model from scratch" / "Pre-training on our dataset" / "Foundation model"
They need Large GPU cluster, high-bandwidth InfiniBand, fast shared storage, Kubernetes or Slurm orchestration
Pitch AI Cloud — multi-node cluster, Soperator, 1 TB/s storage throughput
GPU rec. GB300/GB200 NVL72 or H100/H200 cluster depending on scale
Key proof Recraft trained 20B param model; Photoroom scaled training seamlessly
🔧 Fine-Tuning
They say "Adapting an existing model" / "Domain-specific tuning" / "LoRA / QLoRA"
They need Smaller GPU footprint, managed MLflow for experiment tracking, fast iteration cycles
Pitch AI Cloud — on-demand GPU instances + Managed MLflow + PostgreSQL for metadata
GPU rec. H100 or B200 on-demand; reserve when cycles are predictable
Key proof Wubble — QLoRA adaptation + 1.8s time-to-first-token; Simulacra AI 90% faster compile
⚡ Inference
They say "Serving a model in production" / "Low latency responses" / "Scaling an AI API"
They need High throughput, low latency, autoscaling endpoints, cost-per-token efficiency
Pitch Token Factory API (managed) or AI Cloud + vLLM (self-managed). Depends on control needs.
GPU rec. H200 / B200 for large models; Token Factory if they want zero infra management
Key proof Brave Search: 11M+ AI answers/day, ~100% GPU utilization on Nebius
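The decoder table above can be folded into a tiny lookup for quick pre-call prep or an SA demo script. The mappings below simply encode this sheet's recommendations — they are not an official sizing API.

```python
# Workload → pitch/GPU-tier lookup, mirroring the Workload Decoder table.
# Purely illustrative — this is the cheat sheet encoded, not a sizing API.
RECOMMENDATIONS = {
    "training":    {"pitch": "AI Cloud multi-node cluster (Soperator, 1 TB/s storage)",
                    "gpus": ["GB300 NVL72", "GB200 NVL72", "H200", "H100"]},
    "fine-tuning": {"pitch": "AI Cloud on-demand + Managed MLflow",
                    "gpus": ["H100", "B200"]},
    "inference":   {"pitch": "Token Factory API (managed) or AI Cloud + vLLM",
                    "gpus": ["H200", "B200"]},
}

def recommend(workload: str) -> dict:
    """Return the sheet's pitch and GPU tiers for a workload keyword."""
    key = workload.strip().lower()
    if key not in RECOMMENDATIONS:
        raise ValueError(f"unknown workload: {workload!r} — loop in an SA")
    return RECOMMENDATIONS[key]

print(recommend("inference")["pitch"])
```

Unrecognized workloads deliberately raise an error — that is the "unusual workload, loop in your SA" rule from the escalation section expressed in code.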
6 Discovery Questions That Qualify Any AI Infrastructure Deal
Q1
"What AI workloads are you running today — or planning to run in the next 6 months?"
→ Identifies training vs fine-tuning vs inference; reveals scale and urgency
Q2
"What GPU infrastructure are you using right now, and what's frustrating you about it?"
→ Uncovers incumbent (AWS, GCP, on-prem) and the displacement opportunity
Q3
"How quickly do you need GPU capacity available — and what happens if there's a delay?"
→ Availability speed is a Nebius differentiator (hours vs weeks); quantify the cost of waiting
Q4
"What does your team look like on the infrastructure side — do you have DevOps managing this, or do your ML engineers handle it directly?"
→ Positions managed services and SA support; identifies complexity appetite
Q5
"Are there any compliance or data residency requirements we need to factor in — HIPAA, GDPR, EU-only?"
→ Surfaces regulated industry angle; Nebius has EU DCs (Finland, France, Iceland) + full compliance stack
Q6
"Walk me through your current GPU spend — what's in budget, and what would it take to justify a switch?"
→ Opens TCO conversation; connects to SemiAnalysis study as a third-party anchor
Top Objection Responses
Competitive "We're already on AWS / GCP — why would we switch?"
Don't fight it — add to it. "Most of our customers start exactly there. The question is whether AWS/GCP is optimized for your AI workloads specifically — or whether you're paying hyperscaler tax for infrastructure that wasn't built for ML. SemiAnalysis modeled three real AI workloads and Nebius delivered the lowest TCO across all three. We can run that model against your actual workload in a 30-minute session with one of our Solution Architects."
Commercial "Your pricing seems higher / I need to see a TCO comparison."
Lead with the SemiAnalysis study. "Fair — and we commissioned SemiAnalysis to model exactly this. Across LLM pre-training, multimodal RL research, and production inference, Nebius had the lowest TCO of any provider modeled. The difference is we maximize GPU utilization — bare-metal performance, no hypervisor overhead. We can build your specific workload into the model."
Trust "We've never heard of Nebius — are you a stable company?"
Three anchors: listing, funding, NVIDIA. "Nebius is publicly listed on Nasdaq, so our financials are fully transparent. We raised $700M led by NVIDIA and Accel — NVIDIA doesn't make that investment in a company they don't believe in. We're also an NVIDIA Reference Platform Cloud Partner — a designation held by very few providers globally. And our ISEG supercomputer ranked #19 in the world."
Technical "We're worried about vendor lock-in with a smaller provider."
Openness is a feature, not a compromise. "We're built on open standards — Terraform, Kubernetes, Slurm, standard NVIDIA CUDA. Your workloads run the same way they'd run anywhere else. The Nebius Solution Library on GitHub has all our Terraform recipes publicly available. You're never locked into a proprietary runtime or toolchain."
Trust "We need HIPAA / GDPR compliance — can you support that?"
Full compliance stack, EU data centers. "Yes — Nebius is HIPAA-, SOC 2-, GDPR-, and ISO 27001-compliant with privacy-by-default architecture and tenant-level isolation. For EU data residency specifically, we have data centers in Finland, France, and Iceland. Visit our Trust Center for the full documentation — it's built to answer security questionnaires directly."
Competitive One-Liners
vs AWS
AWS is built for general cloud. Nebius is built for AI. No hypervisor tax, no GPU availability queues, no DevOps overhead — just bare-metal performance available in hours, not weeks.
vs Google Cloud
GCP locks you into Google's proprietary TPU ecosystem. Nebius gives you the latest NVIDIA hardware on open-standard tooling — Kubernetes, Slurm, CUDA — no proprietary runtime lock-in.
vs Azure
Azure is optimized for Microsoft's enterprise stack, not ML-first workloads. Nebius maximizes Model FLOPS Utilization (MFU) — performance on par with leading industry benchmarks at lower TCO.
vs CoreWeave
CoreWeave is GPU-only. Nebius is a full-stack AI cloud — compute plus managed MLOps (MLflow, Spark, PostgreSQL), storage, orchestration, and dedicated SA support included.
Proof Points — Drop These in Deals
11M+
AI-generated answers delivered daily by Brave Search on Nebius — at ~100% GPU utilization
Customer: Brave Search · Workload: Inference
5x
Lower costs compared to major providers, achieved by CentML on Nebius AI Cloud
Customer: CentML · Workload: Inference platform
20B
Model parameters trained by Recraft from scratch — comparable to DALL·E 3 with 49% preference on benchmarks
Customer: Recraft · Workload: GenAI training
90%
Faster model compilation for Simulacra AI — from 2+ hours to 10–20 minutes using Nebius H100/H200 fleet
Customer: Simulacra AI · Workload: Research training
#19
World ranking of ISEG — Nebius's own supercomputer built in Finland, demonstrating in-house infrastructure credibility
Source: TOP500 Supercomputer List
🥇
SemiAnalysis Gold Medal in GPU Cloud ClusterMAX™ Rating — lowest TCO across all three modeled AI workloads
Source: SemiAnalysis ClusterMAX study
When to Bring in Your SA
🟢
Customer asks about multi-node cluster architecture
SA needed → involves InfiniBand topology, Kubernetes/Slurm config, fault tolerance design
🟢
Technical evaluation / POC requested
SA needed → handles environment setup, benchmark design, and results validation
🟢
Customer mentions more than 8 nodes or 1,000+ GPUs
SA + Sales Eng needed → large-scale cluster design and dedicated onboarding required
🟢
Security review or compliance questionnaire
SA + Trust Center docs → SOC 2, HIPAA, GDPR documentation provided by SA
🟢
Customer asks about storage throughput, WEKA, or VAST Data
SA needed → storage architecture is technical; AE sets the stage, SA closes the conversation
🟡
Customer asks "what GPU should I use?"
Use this cheat sheet first — if workload is unusual or scale is large, then loop in SA
🟡
Token Factory API questions
AE can handle pricing + basic use cases; SA only if custom integration or SLA requirements arise